My interest in maritime industries led me to explore data from Global Fishing Watch, an international non-profit organization that provides open source data on global fishing activity. Through their data portal I found a dataset that tracks the locations of two longline fishing vessels throughout their fishing excursions; each point is marked with whether the vessel was determined to be fishing or not fishing. This dataset made me curious about what else was happening at each point, and whether other variables showed patterns related to fishing activity.
Because my dataset covered two separate vessels fishing in two different regions of the world, I framed my investigation around three subsets: the full extent of the data, the vessel 1 observations, and the vessel 2 observations. This approach led to the following three research questions:
The data used in my analysis are descriptive vessel tracking records from Global Fishing Watch, which I supplemented with net primary productivity data from a SESYNC Shiny app. Both components are briefly described below and fully described in my project documentation.
The raw longline data had 65,499 observations and 11 variables when downloaded from Global Fishing Watch.
# Longline data
str(longline_full)
## 'data.frame': 65499 obs. of 11 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ mmsi : num 1.26e+13 1.26e+13 1.26e+13 1.26e+13 1.26e+13 ...
## $ timestamp : int 1327136504 1327136605 1327136734 1327143281 1327143341 1327143411 1327146440 1327149860 1327149911 1327156390 ...
## $ distance_from_shore: num 232994 233994 233994 233994 233996 ...
## $ distance_from_port : num 311749 312410 312410 315417 316173 ...
## $ speed : num 8.2 7.3 6.8 6.9 6.1 ...
## $ course : num 230 238 239 252 231 ...
## $ lat : num 14.9 14.9 14.9 14.8 14.8 ...
## $ lon : num -26.9 -26.9 -26.9 -26.9 -26.9 ...
## $ is_fishing : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ source : Factor w/ 1 level "dalhousie_longliner": 1 1 1 1 1 1 1 1 1 1 ...
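The `timestamp` column in the raw data is stored as an integer that looks like Unix epoch seconds, while the processed data later carries a `Date` field. A minimal sketch of that conversion in base R follows, assuming the values are seconds since 1970-01-01 UTC. The helper name `epoch_to_utc` and the output column names are illustrative and not taken from the original analysis.

```r
# Assumption: `timestamp` holds seconds since 1970-01-01 00:00:00 UTC.
epoch_to_utc <- function(ts) {
  as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
}

# First three timestamps from the str() output above
ts <- c(1327136504, 1327136605, 1327136734)
when <- epoch_to_utc(ts)

converted <- data.frame(
  timestamp = ts,
  date = as.Date(when, tz = "UTC"),   # calendar date in UTC
  time = format(when, "%H:%M:%S")     # time of day in UTC
)
print(converted)
```

Under that assumption, the first observation falls on 2012-01-21.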
# fishing activity counts
x <- count(longline_full, longline_full$is_fishing == 0)
y <- count(longline_full, longline_full$is_fishing == 1)
z <- count(longline_full, longline_full$is_fishing == -1)
print(x)  # 1,397 'not fishing' statuses (2.13% not fishing)
## # A tibble: 2 x 2
## `longline_full$is_fishing == 0` n
## <lgl> <int>
## 1 FALSE 64102
## 2 TRUE 1397
print(y)  # 2,792 'fishing' statuses (4.26% fishing)
## # A tibble: 2 x 2
## `longline_full$is_fishing == 1` n
## <lgl> <int>
## 1 FALSE 62707
## 2 TRUE 2792
print(z)  # 61,310 'no data' statuses (93.6% unknown -- eliminate these)
## # A tibble: 2 x 2
## `longline_full$is_fishing == -1` n
## <lgl> <int>
## 1 FALSE 4189
## 2 TRUE 61310
# narrow data to only include instances of 'fishing' and 'not fishing'
longline_fishing <- filter(longline_full, is_fishing %in% c(0, 1))
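The three separate counts above can also be produced in one pass. Here is a minimal sketch using dplyr, run on a small made-up data frame (`toy` is hypothetical and stands in for `longline_full`). The `status` and `pct` columns are names introduced for illustration.

```r
library(dplyr)

# Hypothetical stand-in for longline_full; only is_fishing matters here.
toy <- data.frame(is_fishing = c(-1, -1, -1, -1, -1, -1, 0, 1, 1, -1))

status_summary <- toy %>%
  count(is_fishing) %>%                      # one row per status code
  mutate(
    status = case_when(
      is_fishing == 1  ~ "fishing",
      is_fishing == 0  ~ "not fishing",
      is_fishing == -1 ~ "no data"
    ),
    pct = round(100 * n / sum(n), 2)         # share of all observations
  )

print(status_summary)
```

Applied to the full data, the same pipeline should reproduce the shares noted with the counts above (for example, 2,792 of 65,499 observations is about 4.26% fishing).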
## [1] 1.263956e+13 5.139444e+13
## [1] Vessel 1 Vessel 2
## Levels: Vessel 1 Vessel 2
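The output above suggests that the two numeric MMSI values were recoded as a factor with the levels `Vessel 1` and `Vessel 2`. One way this could be done in base R is sketched below. The MMSI values are made-up placeholders (the real ones appear only in rounded form), and this is not necessarily how the original analysis did it.

```r
# Placeholder MMSI values; hypothetical, not the real identifiers.
mmsi <- c(111, 111, 222, 111, 222)

ids <- unique(mmsi)               # distinct identifiers, in order of appearance
vessel <- factor(mmsi,
                 levels = ids,
                 labels = paste("Vessel", seq_along(ids)))

print(levels(vessel))
print(table(vessel))
```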
## 'data.frame': 4189 obs. of 10 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MMSI : Factor w/ 2 levels "Vessel 1","Vessel 2": 1 1 1 1 1 1 1 1 1 1 ...
## $ Date : Date, format: "2012-06-02" "2012-06-19" ...
## $ Latitude : Factor w/ 4049 levels "00:00:38","00:01:02",..: 2240 719 570 290 419 2165 2437 532 415 688 ...
## $ Longitude : num 18.6 18.8 19.3 18.9 19.1 ...
## $ Vessel_Speed : num -17.2 -19.5 -17.3 -17.3 -17.1 ...
## $ Distance_From_Shore: num 8.2 5 0.7 4.2 7 ...
## $ NPP_Mean : num 111123 329079 86831 98881 74248 ...
## $ Fishing_Activity : Factor w/ 914 levels "261.947729292956",..: 896 830 902 893 912 851 844 711 698 663 ...
## $ NA : int 1 1 1 1 1 1 1 0 0 1 ...
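In the structure output above, the values do not seem to line up with their labels: the column labelled `Latitude` holds time-of-day strings, `Longitude` holds values that look like latitudes, and the last column has no name. That pattern is what R produces when fewer names than columns are assigned. A small sketch on a made-up data frame (`demo`, with a hypothetical `Time` column) shows the effect and one way to guard against it.

```r
# Made-up three-column data frame standing in for the processed data.
demo <- data.frame(a = "00:00:38", b = 18.6, c = -17.2)

# Assigning only two names to three columns leaves the last name as NA
# and shifts every label after the omitted one.
shifted <- demo
names(shifted) <- c("Latitude", "Longitude")
print(names(shifted))

# Guard: insist that the name vector matches the column count.
new_names <- c("Time", "Latitude", "Longitude")
stopifnot(length(new_names) == ncol(demo))
fixed <- demo
names(fixed) <- new_names
str(fixed)
```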
Entity and attribute information for the processed dataset:
| Data Field | Definition | Units | Source |
|---|---|---|---|
| ID | Unique identifier for observation | NA | GFW |
| MMSI | Unique identifier for vessel | NA | GFW |
| Date | Date of observation | YYYY-MM-DD | GFW |
| Latitude | Latitude coordinate of observation | Decimal Degrees | GFW |
| Longitude | Longitude coordinate of observation | Decimal Degrees | GFW |
| Vessel_Speed | Speed of vessel at observed point | Knots | GFW |
| Distance_From_Shore | Distance vessel is observed from shore | Meters | GFW |
| NPP_Mean | Mean net primary productivity value | mg C/m²/day | SESYNC |
| Fishing_Activity | Indication of whether observed vessel is determined to be fishing (1) or not fishing (0) based on GFW algorithms | NA | GFW |
Exploring the data in space revealed that the two tracked vessels were fishing in two different parts of the world.
<This map shows both vessel 1 observations (in the east Atlantic between Spain and Africa) and vessel 2 observations (in the eastern Pacific between Washington state and Alaska).>
Here the extent is narrowed to show just vessel 1 observations.
<This map shows vessel 1 observations distinguished by the presence or absence of fishing activity. Yellow signifies points where the vessel was determined to be fishing, while purple signifies points where it was not.>
Here is an exploratory view of the additional data included for each observation point.
Here the extent is narrowed to show just vessel 2 observations.
<This map shows vessel 2 observations distinguished by fishing activity. Yellow signifies fishing, while purple signifies not fishing.>
Here is an exploratory view of the additional data included for each observation point.
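A map like the ones described can be sketched with ggplot2. The example below uses a handful of made-up points (the `pts` data frame is hypothetical) and plots longitude against latitude, coloured by fishing status. The viridis discrete scale gives a purple and yellow pairing similar to the one described, though the original maps may have been drawn with a different package.

```r
library(ggplot2)

# Made-up observations; coordinates and statuses are illustrative only.
pts <- data.frame(
  Longitude = c(-17.2, -17.3, -17.1, -17.5, -17.4),
  Latitude  = c(18.6, 18.8, 19.3, 18.9, 19.1),
  Fishing_Activity = factor(c(1, 1, 0, 1, 0),
                            levels = c(0, 1),
                            labels = c("Not fishing", "Fishing"))
)

p <- ggplot(pts, aes(x = Longitude, y = Latitude, colour = Fishing_Activity)) +
  geom_point() +
  scale_colour_viridis_d() +      # first level purple, last level yellow
  coord_quickmap() +              # approximate map aspect ratio
  labs(colour = "Fishing activity")

print(p)
```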